My name is Gilbert Permalloo. I am a Research Project Officer, currently working on the root architecture and rhizosphere of wheat. I worked in sugarcane agronomy for about 24 years, and I did a little basic programming in Fortran 77 and GW-BASIC about 30 years ago. Most of my data manipulation and visualisation was done in Excel. I could not write any code in R before I joined Data School, and I was spending a lot of time working with data in spreadsheets. Now I am amazed every day by what R can do for data manipulation and visualisation.
The aim of this project is to investigate the use of portable X-ray fluorescence spectroscopy (pXRF) as a rapid method to quantify the amount of phosphorus accumulated in straw and grains (Photo-1 shows the pXRF instrument used). About 200 grab samples were taken from one of three trials, with 0 kg and 30 kg of phosphorus per hectare as the treatments for this study. The straw and grains were ground and scanned with the pXRF. The pXRF generated two large datasets: the chemistry dataset contains the elemental composition of each sample for a wide range of elements, quantified in ppm, whereas the beamspectra dataset contains spectral values from three X-ray beams. R was used to clean, tidy and reorganise the data, as well as for graphical visualisation. The phosphorus data were extracted from the large pXRF-generated dataset and merged with a data frame containing unique identification numbers (SampleID) that link the data to the sample source (STEM_ID) and to other spreadsheets that contain agronomic data for each sample.
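As a rough illustration of the extract-and-merge step described above, a minimal sketch in base R might look like the following. Only `SampleID` and `STEM_ID` come from the project; the long-format layout, the `Element` and `ppm` column names, the `STEM_A`/`STEM_B` identifiers and all the values are assumptions made up for the example, not the real pXRF export.

```r
# Hypothetical pXRF chemistry export: one row per sample and element.
# Column names (Element, ppm) and all values are invented for illustration.
chemistry <- data.frame(
  SampleID = c(1, 1, 2, 2, 3, 3),
  Element  = c("P", "K", "P", "K", "P", "K"),
  ppm      = c(400, 9000, 650, 8700, 3500, 4100)
)

# Hypothetical lookup table linking each scan to its sample source.
sample_key <- data.frame(
  SampleID = c(1, 2, 3),
  STEM_ID  = c("STEM_A", "STEM_B", "STEM_A")
)

# Step 1: keep only the phosphorus readings and give the value a clear name.
phosphorus <- chemistry[chemistry$Element == "P", c("SampleID", "ppm")]
names(phosphorus)[names(phosphorus) == "ppm"] <- "P_ppm"

# Step 2: attach the sample source through the shared SampleID key.
p_by_sample <- merge(sample_key, phosphorus, by = "SampleID")
print(p_by_sample)
```

Because every step is written down, the same merge can be re-run whenever the instrument export changes, which is much harder to guarantee with copy-and-paste in a spreadsheet.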
Figures 1 and 2 show that relatively more phosphorus was detected in the grains than in the straw. The correlation between straw and grain phosphorus is slightly higher under the 30 kg/ha P treatment than under the 0 kg/ha P treatment. Figure 2 shows that the level of P detected in the straw does not correlate with grain yield. However, the amount of phosphorus detected in the grains correlates differently with yield for the two treatments: grains produced under the 0 kg/ha treatment tend to show a slight positive correlation with yield, whereas those from the 30 kg/ha treatment show a negative correlation.
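The per-treatment comparison of correlations could be checked numerically along the following lines. This is only a sketch in base R: the column names match those in `straw_grain_p.csv`, but the `toy` data frame and its values are invented for illustration and are not the trial results, and Pearson's r is an assumed choice of correlation measure.

```r
# Toy data with the same column names as straw_grain_p.csv.
# The values are made up for illustration and are NOT the trial results.
toy <- data.frame(
  PAP.x       = rep(c("0 kg P", "30 kg P"), each = 4),
  straw_pconc = c(300, 350, 400, 450, 400, 500, 600, 700),
  grain_pconc = c(2500, 2900, 2800, 3300, 3900, 4200, 4500, 4800)
)

# Split the rows by treatment, then compute Pearson's r between
# straw P and grain P within each treatment group.
r_by_treatment <- sapply(
  split(toy, toy$PAP.x),
  function(d) cor(d$straw_pconc, d$grain_pconc)
)
print(round(r_by_treatment, 3))
```

Replacing `toy` with the real data frame would give one correlation coefficient per P treatment, which can then be compared with the slopes of the fitted lines in the figures.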
library(tidyverse)
library(kableExtra)
# Read the cleaned straw and grain phosphorus data
data_straw_grain <- read_csv("clean_data/straw_grain_p.csv")
# Keep the columns of interest and give them readable names
data_straw_grain_p <- data_straw_grain %>%
  select(STEM_ID, GENOTYPE, straw_pconc, PAP.x, grain_pconc, yield) %>%
  rename(`P in straw` = straw_pconc, `P in grain` = grain_pconc, `P level (kg/ha)` = PAP.x, `yield (kg)` = yield)
# Show the first five rows as a striped HTML table
knitr::kable(head(data_straw_grain_p, n = 5),
             format = "html",
             caption = "Amount of phosphorus (ppm) detected in straw and grains for each genotype") %>%
  kable_styling("striped")

| STEM_ID | GENOTYPE | P in straw | P level (kg/ha) | P in grain | yield (kg) |
|---|---|---|---|---|---|
| ST50PKT0WD9S | CAV4081442 | 402 | 0 kg P | 2699 | 1.802632 |
| ST50PKT0WCYG | CAV4080777 | 433 | 0 kg P | 3587 | 2.854010 |
| ST50PKT0WD1B | CAV4081233 | 659 | 30 kg P | 4627 | 3.201220 |
| ST50PKT0WCD2 | CAV4080976 | 334 | 0 kg P | 3843 | 2.881579 |
| ST50PKT0WC08 | CAV4081051 | 430 | 30 kg P | 4440 | 3.059210 |
Photo-1: pXRF instrument used to quantify amount of phosphorus in straw and grains
# Read the cleaned data and plot straw P against grain P, coloured by P treatment
straw_grain_p <- read_csv("clean_data/straw_grain_p.csv")
straw_grain <- ggplot(data = straw_grain_p,
                      mapping = aes(x = grain_pconc,
                                    y = straw_pconc,
                                    colour = PAP.x)) +
  geom_point(alpha = 0.2) +
  # One linear fit per treatment, without confidence bands
  geom_smooth(method = "lm", size = 0.5, se = FALSE)
straw_grain +
  labs(x = "Grains",
       y = "Straw")

Figure 1: Amount of phosphorus (ppm) in straw vs in grains
# Read the cleaned pXRF data and plot P concentration against grain yield
good_data <- read_csv("clean_data/good_data_york_pxrf.csv")
yield_Pconc <- ggplot(data = good_data,
                      mapping = aes(x = yield,
                                    y = `P Concentration`,
                                    colour = PAP,
                                    shape = SUBSAMPLE)) +
  geom_point() +
  # One linear fit per PAP and SUBSAMPLE combination, without confidence bands
  geom_smooth(method = "lm", size = 0.5, se = FALSE)
yield_Pconc +
  labs(x = "Yield (kg)",
       y = "P Concentration (ppm)")

Figure 2: Amount of phosphorus in straw and grains vs grain yield
For this project, I used R version 3.6.1 and the packages tidyverse, ggplot2, kableExtra, imager, data.table, readxl and lubridate.
Most of my time went into tidying and cleaning the data. That made me realise how crucial it is to understand how the data are collected and what is collected, how they are structured and formatted, and how and where they are stored. I wrote R code to resolve these issues and bring the data together in one clean dataset that anyone can reuse at any time in the future.
I will continue to use R for data manipulation so that I improve my skills and build up my trust in good, reusable data. I am looking forward to using R at every stage, from preparing trial designs through data manipulation, visualisation and analysis to publishing.
My Data School experience has been challenging, exciting and, most of all, very enriching. One of the challenges was the shift in mindset needed to learn and adopt a new platform for manipulating data safely, explicitly and with a higher level of repeatability, far removed from the risky bad habit of doing all data manipulation in spreadsheets. I really enjoyed everything, but my favourite part is the vast range of options available for plotting all sorts of graphs to convey the most important information. My team members are very excited to embark on the same marvellous journey of learning R through Data School in the future.
Kerensa McElroy, Stephen Pearce, Nat Lui, Alex Whan, Neil Francis, Jen Taylor